 cross-lingual ability


Semantic Pivots Enable Cross-Lingual Transfer in Large Language Models

arXiv.org Artificial Intelligence

Large language models (LLMs) demonstrate remarkable ability in cross-lingual tasks. Understanding how LLMs acquire this ability is crucial for their interpretability. To quantify the cross-lingual ability of LLMs accurately, we propose a Word-Level Cross-Lingual Translation Task. To find how LLMs learn cross-lingual ability, we trace the outputs of LLMs' intermediate layers in the word translation task. We identify and distinguish two distinct behaviors in the forward pass of LLMs: co-occurrence behavior and semantic pivot behavior. We attribute LLMs' two distinct behaviors to the co-occurrence frequency of words and find the semantic pivot from the pre-training dataset. Finally, to apply our findings to improve the cross-lingual ability of LLMs, we reconstruct a semantic pivot-aware pre-training dataset using documents with a high proportion of semantic pivots. Our experiments validate the effectiveness of our approach in enhancing cross-lingual ability. Our research contributes insights into the interpretability of LLMs and offers a method for improving LLMs' cross-lingual ability.


Analyzing the Evaluation of Cross-Lingual Knowledge Transfer in Multilingual Language Models

arXiv.org Artificial Intelligence

Recent advances in training multilingual language models on large datasets seem to have shown promising results in knowledge transfer across languages and to achieve high performance on downstream tasks. However, we question to what extent the current evaluation benchmarks and setups accurately measure zero-shot cross-lingual knowledge transfer. In this work, we challenge the assumption that high zero-shot performance on target tasks reflects high cross-lingual ability by introducing more challenging setups involving instances with multiple languages. Through extensive experiments and analysis, we show that the observed high performance of multilingual models can be largely attributed to factors not requiring the transfer of actual linguistic knowledge, such as task- and surface-level knowledge. More specifically, we observe that what has been transferred across languages is mostly data artifacts and biases, especially for low-resource languages. Our findings highlight the overlooked drawbacks of existing cross-lingual test data and evaluation setups, calling for a more nuanced understanding of the cross-lingual capabilities of multilingual models.


Empowering Cross-lingual Abilities of Instruction-tuned Large Language Models by Translation-following demonstrations

arXiv.org Artificial Intelligence

The language ability of Large Language Models (LLMs) is often unbalanced towards English because of the imbalance in the distribution of the pre-training data. This disparity persists through further fine-tuning and affects the cross-lingual abilities of LLMs. In this paper, we propose to empower Instruction-tuned LLMs (It-LLMs) in languages other than English by building semantic alignment between them. Hence, we propose CrossAlpaca, an It-LLM with cross-lingual instruction-following and Translation-following demonstrations to improve semantic alignment between languages. We validate our approach on the multilingual Question Answering (QA) benchmarks XQuAD and MLQA and adapted versions of MMLU and BBH. Our models, tested over six different languages, outperform the It-LLMs tuned on monolingual data. The final results show that instruction tuning on non-English data is not enough and that semantic alignment can be further improved by Translation-following demonstrations.


Cross-Lingual Ability of Multilingual BERT: An Empirical Study

arXiv.org Artificial Intelligence

Recent work has exhibited the surprising cross-lingual abilities of multilingual BERT (M-BERT) - surprising since it is trained without any cross-lingual objective and with no aligned data. In this work, we provide a comprehensive study of the contribution of different components in M-BERT to its cross-lingual ability. The experimental study is done in the context of three typologically different languages - Spanish, Hindi, and Russian - and using two conceptually different NLP tasks, textual entailment and named entity recognition. Among our key conclusions is the fact that the lexical overlap between languages plays a negligible role in the cross-lingual success, while the depth of the network is an integral part of it.

Embeddings of natural language text via unsupervised learning, coupled with sufficient supervised training data, have been ubiquitous in NLP in recent years and have shown success in a wide range of monolingual NLP tasks, mostly in English. Training models for other languages has been shown to be more difficult, and recent approaches relied on bilingual embeddings that allowed the transfer of supervision from high-resource languages like English to models in lower-resource languages; however, inducing these bilingual embeddings required some level of supervision (Upadhyay et al., 2016). Not only is the model contextual, but its training also requires no supervision - no alignment between the languages is done. Nevertheless, and despite being trained with no explicit cross-lingual objective, M-BERT produces a representation that seems to generalize well across languages for a variety of downstream tasks (Wu & Dredze, 2019). In this work, we attempt to develop an understanding of the success of M-BERT.